translation-invariant measures on $K^{\mathbb{Z}}$ called algebraic measures
to study the entropy rate of a hidden Markov process. Under
some irreducibility assumptions on the Markov transition matrix
we derive exact formulas for the entropy rate of a general $q$-state
hidden Markov process derived from a Markov source corrupted
by a specific noise model. We obtain upper bounds on the error
when using an approximation to the formulas and numerically
compute the entropy rates of two- and three-state hidden Markov
models.
I. Introduction

In this paper we study the entropy rate of a hidden Markov
process using a class of translation-invariant measures on
the chain $K^{\mathbb{Z}}$, where $K = \{0, \dots, q-1\}$. These measures,
known as manifestly positive algebraic measures, and their
properties were first studied in [1]. A one-to-one correspondence
between manifestly positive algebraic measures and hidden
Markov processes was shown in [1]. We use the
properties of the algebraic measures to give formulas for the
entropy rate of a hidden Markov process derived from a certain
noise model that we will describe later.

In information theory one models an information source as
a stochastic process $\{X_i\}_{i=1}^{\infty}$ with each $X_i$ a random variable
taking values in the alphabet set $K = \{0, \dots, q-1\}$. The
Shannon entropy of a random variable $X$ taking values in a
set $K$ is defined as

$$S(X) = -\sum_{x \in K} p(x)\log_2 p(x). \tag{1}$$

The entropy of the source for the first $n$ transmitted
symbols is given by the joint entropy of $X_1, \dots, X_n$:

$$S_n(X_1, X_2, \dots, X_n) = -\sum_{x_1,\dots,x_n} p(x_1,\dots,x_n)\log_2 p(x_1,\dots,x_n). \tag{2}$$

The entropy rate of the source is defined by

$$H(\mu) = \lim_{n\to\infty}\frac{S_n(X_1, X_2, \dots, X_n)}{n} \tag{3}$$



where $\mu$ is the measure associated with the sequence of random
variables. This limit exists when $\{X_n\}$ forms a stationary
stochastic process. Entropy rate is an important quantity in
information theory, as it measures the average amount
of information per symbol of a stochastic process. There
is a well known formula for the entropy rate of a Markov
source. Hidden Markov chains can be thought of as noisy
observations of a Markov source emitting a sequence of symbols.
Hidden Markov models have been extensively studied and
the statistical methods based on them have
been successfully applied in diverse fields such as speech
recognition, image analysis and restoration, DNA sequencing,
communication and information theory. Even though extensive
research has been carried out on hidden Markov models
[2], [3], the problem of deriving explicit expressions for the
entropy rate in terms of the parameters of the process
is still an open issue.
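
As a point of reference for the Markov case mentioned above, the following is a minimal sketch of the standard formula $H = -\sum_i \pi_i \sum_j P_{ij}\log_2 P_{ij}$, where $\pi$ is the stationary distribution of the transition matrix $P$. The function names and the example matrix are illustrative choices, not part of the paper.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalised to sum to 1."""
    P = np.asarray(P, dtype=float)
    vals, vecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(vals - 1.0))       # eigenvalue closest to 1
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

def markov_entropy_rate(P):
    """Entropy rate of a stationary Markov chain in bits per symbol."""
    P = np.asarray(P, dtype=float)
    pi = stationary_distribution(P)
    logP = np.zeros_like(P)
    mask = P > 0                            # use the convention 0 * log 0 = 0
    logP[mask] = np.log2(P[mask])
    return float(-np.sum(pi[:, None] * P * logP))

# Illustrative two-state chain (an assumption made for this sketch).
P = [[0.85, 0.15], [0.28, 0.72]]
print(markov_entropy_rate(P))
```

No comparably simple closed form is available once the chain is observed through noise, which is the problem the rest of the paper addresses.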

The entropy rate of a hidden Markov process was first
studied by Blackwell in 1957 [4], who showed that the entropy
rate is given by the integral

$$H(\mu) = \int_W S(w)\,\phi(dw) \tag{4}$$

where $S$ is the Shannon entropy, $w$ belongs to the simplex
$W = \{w = (w_1, w_2, \dots, w_q) : w_i \geq 0,\ \sum_i w_i = 1\}$, and $\phi(dw)$
is a particular measure on the simplex $W$ called the Blackwell
measure. Recently there has been renewed interest in the
problem of calculating the entropy rate of a hidden Markov
chain. The papers [5], [6] showed a connection between the
entropy rate and the top Lyapunov exponent of a product of
random matrices. The entropy rate was studied in the context of
filtering and denoising in [7], [8]. The
asymptotic behavior, the smoothness and analytic properties,
and new and improved bounds for the entropy
rate in terms of the process parameters are studied in [9],
[10], [11], [12]. A calculation of the entropy rate based on ideas
from statistical mechanics is done in [13]. In this paper we
follow the approach of [1], wherein a formula similar to (4)
was derived. Moreover, in [1] the support of the Blackwell
measure was explicitly characterized. We see that for
the particular noise model that we study in this paper
the support of the measure $\phi(dw)$ simplifies and leads to an
analytic solution for the entropy rate.

In Section II-A we review some key results on algebraic
measures from [1] that we will use in this paper. The noise
model is described and the support of the measure $\phi(dw)$
is computed in Section III. The main theorem giving the
formula for the entropy rate of the hidden Markov process
is proved in Section IV. Lastly, in Section V we show numerical
computations of the entropy rate using approximations to the
formulas obtained in Section IV.

II. Setup 

A. Manifestly positive algebraic measures 

Let $q \in \mathbb{N}$ and $K = \{0, 1, \dots, q-1\}$, and consider the chain
$\Omega = K^{\mathbb{Z}}$ consisting of configurations $\omega = (\dots, \omega_i, \dots)$
with each $\omega_i \in K$. Let $\mathcal{F}$ be the sigma algebra generated
by all the cylinder sets. Let $T : \Omega \to \Omega$ be the shift
transformation given by

$$T\omega = \tilde{\omega} \quad\text{where}\quad \tilde{\omega}_n = \omega_{n+1}.$$

A measure $\mu$ on $\Omega$ is called translation invariant if

$$\mu(E) = \mu(T^{-1}E) \quad \forall E \in \mathcal{F}.$$

In [1] a special class of translation-invariant measures called
algebraic measures was constructed on the chain $\Omega$ in terms
of a triplet $(\mathcal{U}, \rho, (E_a)_{a\in K})$ where

• $\mathcal{U}$ is a real ordered algebra (the order determined by a
convex cone $\mathcal{U}^{+}$).
• $\rho$ is a positive linear functional on $\mathcal{U}$.
• $E_a \in \mathcal{U}^{+}$ satisfying $\rho(AE) = \rho(EA) = \rho(A)$ $\forall A \in \mathcal{U}$, and
$\rho(E) = 1$, where $E = \sum_{a\in K} E_a$.

We state the following proposition from [1].

Proposition II.1: Given the triplet $(\mathcal{U}, \rho, (E_a)_{a\in K})$ there
exists a unique translation invariant probability measure $\mu$ on
$K^{\mathbb{Z}}$ such that

$$\mu(\omega_{[m,n]}) = \rho(E_{\omega_m} E_{\omega_{m+1}} \cdots E_{\omega_n})$$

where $\omega_{[m,n]} = (\omega_m, \omega_{m+1}, \dots, \omega_n)$.

The measure $\mu$ was called an algebraic measure, and it
was shown that if $\mathcal{U}$ is finite dimensional the triplet
$(\mathcal{U}, \rho, (E_a)_{a\in K})$ has a unique representation in a finite
dimensional vector space. Let $(\mathbb{R}^d)^{+}$ denote the positive cone
in $\mathbb{R}^d$ of vectors with all components non-negative, and let
$(M_d)^{+}$ be the $d \times d$ matrices that preserve this positive cone, i.e. the
matrices all of whose elements are non-negative. An algebraic measure $\mu$
is called manifestly positive if there exist a $d \in \mathbb{N}$, positive
$\tau, \sigma \in (\mathbb{R}^d)^{+}$ and, for each $a \in K$, a positive $E_a \in (M_d)^{+}$ such
that

$$\mu((\omega_m, \omega_{m+1}, \dots, \omega_n)) = \langle \tau | E_{\omega_m} \cdots E_{\omega_n} \sigma \rangle.$$



Two important examples of manifestly positive algebraic measures
are Markov chains and hidden Markov models.

1) Markov chains: Let $\{X_j\}_{j\in\mathbb{N}}$ be a Markov chain taking
values in $K = \{0, \dots, q-1\}$ and having stationary measure $\mu$.
Let

• $\sigma \in (\mathbb{R}^q)^{+}$ be the vector with all components equal to 1.
• $\tau \in (\mathbb{R}^q)^{+}$ be such that $\tau_a = \mu((a))$, i.e. the $a$-th component
of the stationary distribution.
• $E_a \in (M_q)^{+}$, with $(E_a)_{b,c} = \delta_{a,b}\,\Pr[X_{j+1} = c \mid X_j = b]$ for all $a, b, c \in K$, be
the matrix whose only non-vanishing row is the $a$-th
row of the transition matrix $E = \sum_{a\in K} E_a$.

Then it is easy to see that

$$\mu((\omega_m, \dots, \omega_n)) = \langle \tau | E_{\omega_m} \cdots E_{\omega_n} \sigma \rangle, \qquad E^{*}\tau = \tau, \qquad E\sigma = \sigma.$$



2) Hidden Markov models: In [1] a one-to-one correspondence
was shown between functions of Markov processes (hidden
Markov models) and the class of manifestly positive algebraic
measures. Let $X = \{X_j\}_{j\in\mathbb{N}}$ be a Markov chain that takes
values in a finite alphabet $L$ with transition matrix $E$ and stationary
measure $\nu$. Let $F_a$ be the matrix whose only non-vanishing
row is equal to the $a$-th row of $E$. Thus

$$E = \sum_{a\in L} F_a. \tag{5}$$

We can represent the measure $\nu$ as in example 1 by the triplet
$(\tau, \mathbf{1}, (F_a)_{a\in L})$, where $\mathbf{1}$ is the all-ones vector. Let $Y = \{Y_i\}_{i=1}^{\infty}$ be a noisy observation of
the Markov chain with values in $K = \{0, 1, \dots, q-1\}$. Define
the matrix $R = [r_{ab}]$ with $r_{ab} = \Pr[Y_i = a \mid X_i = b]$ and let

$$E_a = \sum_{b\in L} r_{ab} F_b. \tag{6}$$

It is easy to verify that the manifestly positive representation of
the stationary measure $\mu$ associated with $Y$ is $(\tau, \sigma, (E_a)_{a\in K})$,
so that

$$\mu((\omega_m, \dots, \omega_n)) = \langle \tau | E_{\omega_m} \cdots E_{\omega_n} \sigma \rangle \tag{7}$$

where $\tau_a = \nu(a)$ and $\sigma$ is a vector with all components equal
to 1.
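
The construction in equations (5) to (7) can be illustrated with a short sketch. The transition matrix `E` and the noise matrix `R` below are hypothetical values chosen only for illustration, and the helper names are not from the paper. The sketch builds the matrices $F_a$ and $E_a$ and evaluates cylinder probabilities as $\langle\tau|E_{\omega_m}\cdots E_{\omega_n}\sigma\rangle$.

```python
import itertools
import numpy as np

def row_matrices(E):
    """F_a: copy of E with every row except row a set to zero, so sum_a F_a = E."""
    E = np.asarray(E, dtype=float)
    out = []
    for a in range(E.shape[0]):
        F = np.zeros_like(E)
        F[a] = E[a]
        out.append(F)
    return out

def observation_matrices(E, R):
    """E_a = sum_b R[a, b] * F_b, where R[a, b] = Pr[Y = a | X = b] (equation (6))."""
    F = row_matrices(E)
    R = np.asarray(R, dtype=float)
    return [sum(R[a, b] * F[b] for b in range(len(F))) for a in range(R.shape[0])]

def stationary(E):
    """Stationary distribution tau of the hidden chain (E^T tau = tau)."""
    vals, vecs = np.linalg.eig(np.asarray(E, dtype=float).T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def cylinder_probability(word, tau, Es):
    """<tau | E_{w_m} ... E_{w_n} sigma> with sigma the all-ones vector (equation (7))."""
    v = np.ones(len(tau))
    for a in reversed(word):        # apply the rightmost matrix to sigma first
        v = Es[a] @ v
    return float(tau @ v)

E = np.array([[0.9, 0.1], [0.3, 0.7]])    # hypothetical transition matrix
R = np.array([[1.0, 0.2], [0.0, 0.8]])    # hypothetical noise, columns sum to 1
Es = observation_matrices(E, R)
tau = stationary(E)
total = sum(cylinder_probability(w, tau, Es)
            for w in itertools.product(range(2), repeat=4))
print(total)                              # probabilities of all length-4 words
```

Summing over all words of a fixed length is a quick sanity check that the triplet defines a probability measure, since $\sum_a E_a = E$ and $E\sigma = \sigma$.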

B. Entropy rate of manifestly positive algebraic measures 

It was shown that under certain irreducibility conditions
(see Condition 1 given below) the mean entropy rate of
a manifestly positive algebraic measure can be computed
as an integral with respect to a measure on the simplex
$W = \{w = (w_1, w_2, \dots, w_q) : w_i \geq 0,\ \sum_i w_i = 1\}$ similar to
the Blackwell measure.

Condition 1:

i. There exists a $c > 0$ such that for all $a, b \in K$, $E_a E_b \geq c E_a$.

ii. There exists an $a_0 \in K$ such that the invariant subspace
corresponding to the largest eigenvalue of $E_{a_0}$ is one
dimensional and all other eigenvalues of $E_{a_0}$ have strictly
smaller modulus.

iii. $E$ is irreducible, i.e. the invariant subspace corresponding
to the largest eigenvalue 1 of $E$ is one dimensional.

We have the following theorem for the entropy rate of a
function of a Markov process.

Theorem II.2 ([1]): The mean entropy rate $H(\mu)$ of a manifestly
positive algebraic measure $\mu$ (satisfying Condition 1)
is given by

$$H(\mu) = \sum_{a\in K}\int_W h_a(w)\,\phi(dw) \tag{8}$$

where $h_a(w) = -\langle w|E_a\sigma\rangle\log\langle w|E_a\sigma\rangle$ and $\phi(dw)$ is a
probability measure on the simplex $W$.
In [1] an equation for $\phi(dw)$ is derived in terms of a Markov
operator $T_\mu$ on $C(W)$, the space of continuous functions on
the simplex $W$. In addition the support of the measure is also
characterized. For functions of Markov processes Blackwell
[4] obtained a formula similar to equation (8); however, there
was no clear connection of the measure with the Markov
operator and the support of the measure was not explicitly
characterized. The Markov operator $T_\mu$ can be described in
terms of the collection $(E_a \mid a \in K)$ as follows. We first define
$\Gamma_a : W_0 \to W_0$, with $W_0 = W \cup \{0\}$, by

$$\Gamma_a(w) = \begin{cases} \dfrac{E_a^{*}w}{\langle w|E_a\sigma\rangle} & \text{if } \langle w|E_a\sigma\rangle \neq 0 \\ 0 & \text{otherwise.} \end{cases} \tag{9}$$

Let $C_0(W_0)$ be the set of continuous real valued functions on $W_0$ that
vanish at 0; then $T_\mu : C_0(W_0) \to C_0(W_0)$ is defined by

$$(T_\mu f)(w) = \sum_{a\in K}\langle w|E_a\sigma\rangle f(\Gamma_a w). \tag{10}$$



By the Riesz-Markov theorem (Theorem IV.14 [14]) there is a
one-to-one correspondence between the space of measures and
normalized positive linear functionals on $C(W)$. Therefore

$$\phi(f) = \int_W f(w)\,\phi(dw), \quad f \in C(W). \tag{11}$$

In [1] the measure $\phi(dw)$ was characterized as the unique
measure on $C(W)$ that is invariant under $T_\mu$, i.e.

$$\phi(f) = \phi(T_\mu f), \quad f \in C(W). \tag{12}$$

The support of $\phi(dw)$ is given by

$$\mathrm{supp}(\phi) = \Lambda = \{\Gamma_{(\omega)}\mathbf{f} \mid \omega \in K^n,\ n \in \mathbb{N}\} \tag{13}$$

where $\Gamma_{(\omega)} = \Gamma_{\omega_{n-1}} \cdots \Gamma_{\omega_0}$ and $\mathbf{f}$ is the only non-trivial fixed
point of $\Gamma_{a_0}$.



III. Noise model and support of the measure $\phi$

The entropy rate formulas derived in this paper are for
a general $q$-state hidden Markov process described by a
particular noise model. In this section we describe the noise
model and the support of the measure $\phi(dw)$ given by equation
(13) for this noise model. The noise model that we work with
assumes that noise does not affect exactly one of the input
symbols, say the symbol 0. If the symbol 0 is transmitted then
it is always unambiguously received at the other end. On the
other hand, if any other symbol is transmitted then it is
either received without error or received as the symbol 0
with a small error probability. That is, $P(Y_i = 0 \mid X_i = 0) = 1$,
$P(Y_i = 0 \mid X_i = a) = \epsilon_a$ and $P(Y_i = a \mid X_i = a) = 1 - \epsilon_a$ for
$a = 1, \dots, q-1$, and $P(Y_i = b \mid X_i = a) = 0$ when $0 \neq b \neq a$.
See Figure 1 for a description of the model in the special case
$q = 3$. In this paper, without loss of generality, we shall
assume that the unambiguous symbol is 0. Let
$\{F_a\}$ be the matrices that describe the uncorrupted Markov
source as in equation (5). For this noise model we write the
matrices $\{E_a\}$ given by equation (6) as

$$E_0 = F_0 + \sum_{a=1}^{q-1}\epsilon_a F_a, \qquad E_a = (1-\epsilon_a)F_a \ \text{ for } a = 1, \dots, q-1, \tag{14}$$

so that $\sum_{a\in K} E_a = \sum_{a\in K} F_a = E$.

Let $e_i$ denote the transpose of the row of $E = [e_{ij}]$ indexed by the state $i$.

Fig. 1. The noise model for $q = 3$. If 0 is transmitted, 0 is received with
probability 1. If 1 or 2 is transmitted, the received symbol is 0 with error
probability $\epsilon_1$ or $\epsilon_2$ respectively.



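Let $p = \min_{i,j} e_{ij}$ and $P = \max_{i,j} e_{ij}$. Our only assumption will be

Assumption 1: $0 < p \leq P < 1$, $\epsilon_0 = 1$ and $\epsilon_a > 0$ $\forall a \in \{1, \dots, q-1\}$.

From equations (9) and (14) one can see that for each $\nu \in W$

$$\Gamma_0\nu = \frac{\sum_{a=0}^{q-1}\epsilon_a\nu_a e_a}{\sum_{a=0}^{q-1}\epsilon_a\nu_a} = \sum_{a=0}^{q-1}\alpha_a e_a \tag{15}$$

with $\sum_{a=0}^{q-1}\alpha_a = 1$ and $\alpha_a = \epsilon_a\nu_a / \sum_{c=0}^{q-1}\epsilon_c\nu_c$.

Because of Assumption 1 we get

$$p \leq (\Gamma_0\nu)_a \leq P \quad \forall a \in K. \tag{16}$$

Also one gets from equation (9), for all $a \in \{1, \dots, q-1\}$,

$$\Gamma_a\nu = \begin{cases} e_a & \text{if } \nu_a \neq 0 \\ 0 & \text{otherwise.} \end{cases} \tag{17}$$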
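
The identities (15) to (17) can be checked numerically. The sketch below is only illustrative: the function names are assumptions, and the matrix and noise parameters are taken from Example 2 of Section V. It compares the general definition (9) of $\Gamma_a$ with the closed forms that hold for this noise model.

```python
import numpy as np

def noise_model_matrices(E, eps):
    """E_0 = F_0 + sum_{a>=1} eps_a F_a and E_a = (1 - eps_a) F_a, as in equation (14)."""
    E = np.asarray(E, dtype=float)
    eps = np.asarray(eps, dtype=float)      # eps[0] = 1 by Assumption 1
    Es = [np.diag(eps) @ E]                 # row a of E scaled by eps_a
    for a in range(1, E.shape[0]):
        Ea = np.zeros_like(E)
        Ea[a] = (1.0 - eps[a]) * E[a]
        Es.append(Ea)
    return Es

def gamma_generic(Ea, nu):
    """Equation (9): E_a^T nu / <nu | E_a sigma>, or 0 if the denominator vanishes."""
    denom = nu @ Ea @ np.ones(len(nu))
    return Ea.T @ nu / denom if denom > 0 else np.zeros_like(nu)

def gamma0_closed_form(E, eps, nu):
    """Equation (15): convex combination of the rows e_a with weights alpha_a."""
    alpha = eps * nu / np.sum(eps * nu)
    return E.T @ alpha

E = np.array([[0.4, 0.25, 0.35], [0.25, 0.45, 0.3], [0.2, 0.55, 0.25]])
eps = np.array([1.0, 0.01, 0.02])
Es = noise_model_matrices(E, eps)
nu = np.array([0.2, 0.5, 0.3])              # an arbitrary point of the simplex
g0 = gamma_generic(Es[0], nu)
print(np.allclose(g0, gamma0_closed_form(E, eps, nu)))   # check of (15)
print(E.min() <= g0.min(), g0.max() <= E.max())          # check of (16)
print(np.allclose(gamma_generic(Es[1], nu), E[1]))       # check of (17)
```

The second check reflects that $\Gamma_0\nu$ is a convex combination of the rows of $E$, so each of its components lies between the smallest and largest entries of $E$.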



A key factor that simplifies our analysis of the entropy rate
is that the support of $\phi(dw)$ given by equation (13) simplifies
significantly with this noise model. We have the following
proposition about the support of the measure $\phi$.

Proposition III.1: The support of the measure $\phi$ is given by

$$\Lambda = \overline{\{\Gamma_0^m e_j \mid j \in \{1, \dots, q-1\};\ m \in \mathbb{N}_0\}}. \tag{18}$$

Proof: We know from equation (13) that the support of
the measure $\phi$ is given by

$$\Lambda = \{\Gamma_{(\omega)}\mathbf{f} \mid \omega \in K^n,\ n \in \mathbb{N}\}$$

where $\mathbf{f}$ is the non-trivial fixed point of $\Gamma_0$. Since by equation
(16) every component of $\mathbf{f} = \Gamma_0\mathbf{f}$ is strictly positive, it follows
from equation (17) that $\Gamma_a\mathbf{f} = e_a$
and $\Gamma_a e_b = e_a$ for $a \neq 0$. Therefore

$$\Lambda = \{\mathbf{f}\} \cup \{\Gamma_0^m e_j \mid j \in \{1, \dots, q-1\};\ m \in \mathbb{N}_0\}.$$

One can show ([1], equation (28)) that for any $\nu \in W$ there
exist constants $C_1$ and $\rho < 1$ such that

$$\|\Gamma_0^k\nu - \mathbf{f}\| \leq C_1\rho^k. \tag{19}$$

This implies that all subsequences in $\Lambda$ converge to $\mathbf{f}$ and
hence $\mathbf{f}$ is the only accumulation point in $\Lambda$. Therefore

$$\Lambda = \overline{\{\Gamma_0^m e_j \mid j \in \{1, \dots, q-1\};\ m \in \mathbb{N}_0\}}.$$

Moreover, for any $j \in \{1, \dots, q-1\}$,

$$\lim_{m\to\infty}\Gamma_0^m e_j = \mathbf{f}. \tag{20}$$



We note that $\Lambda$ may consist of finitely or infinitely many distinct
points in $W$. For instance, if $e_0 = e_1 = \cdots = e_{q-1}$
then it is easy to see that $\mathbf{f} = e_0 = \cdots = e_{q-1}$ and $\Lambda$ consists
of a single element, i.e. $\Lambda = \{e_0\}$. It is clear from (19) that
if $\Lambda$ is a finite set then $\mathbf{f} = \Gamma_0^n e_j$ for some $n \in \mathbb{N}$ and some
$j \in \{1, \dots, q-1\}$. On the other hand $\Lambda$ can be countably
infinite, as can be seen from the following lemma.
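
The structure of the support can be explored numerically by iterating $\Gamma_0$ on a row of $E$. The following sketch uses the two-state matrix of Example 1 with $\epsilon = 0.2$ (the value used there for the support plot); the function names and the number of iterations are illustrative assumptions.

```python
import numpy as np

def gamma0(E, eps, nu):
    """Gamma_0 nu = sum_a eps_a nu_a e_a / sum_a eps_a nu_a (equation (15))."""
    w = eps * nu
    return E.T @ w / w.sum()

def orbit(E, eps, j, M):
    """Rows are e_j, Gamma_0 e_j, ..., Gamma_0^M e_j."""
    pts = [E[j].copy()]
    for _ in range(M):
        pts.append(gamma0(E, eps, pts[-1]))
    return np.array(pts)

E = np.array([[0.85, 0.15], [0.28, 0.72]])
eps = np.array([1.0, 0.2])
pts = orbit(E, eps, j=1, M=60)
f = pts[-1]                                  # numerical stand-in for the fixed point
dist = np.abs(pts - f).sum(axis=1)           # l1 distance of each orbit point to f
print(pts[:4])
print(dist[:6])
```

In this example the distances shrink geometrically, in line with (19), and the first orbit points are pairwise distinct, which is the situation described by the lemma below when $E_0$ is one-to-one.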

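Lemma III.2: If $E_0$ is one-to-one then $\Lambda$ is a countably
infinite set of distinct elements.

Proof: We will first show that $E_0$ one-to-one implies that
$\Gamma_0 : W \to W$ is one-to-one. If $\Gamma_0\nu = \Gamma_0\eta$ for $\nu, \eta \in W$, then

$$\frac{E_0^{*}\nu}{\langle\nu|E_0\sigma\rangle} = \frac{E_0^{*}\eta}{\langle\eta|E_0\sigma\rangle}.$$

Therefore

$$E_0^{*}(\nu - C\eta) = 0 \quad\text{where}\quad C = \frac{\langle\nu|E_0\sigma\rangle}{\langle\eta|E_0\sigma\rangle}.$$

But $E_0$ one-to-one implies

$$\nu = C\eta$$

and since $\nu, \eta \in W$ we have $C = 1$, so $\nu = \eta$. Since
$E_0$ is one-to-one, the vectors $e_a \in W$, $a \in K$, form a linearly
independent set. We will show that all the elements of $\Lambda$ are
distinct, i.e. for any $m, n \geq 0$ and $i, j \in \{1, \dots, q-1\}$ with $(i, m) \neq (j, n)$,

$$\Gamma_0^m e_i \neq \Gamma_0^n e_j \quad\text{and}\quad \Gamma_0^m e_i \neq \mathbf{f}. \tag{21}$$

Assume $\Gamma_0^m e_i = \Gamma_0^n e_j$ for some $m, n \geq 0$ and for $i, j \in
\{1, \dots, q-1\}$. If $m$ and $n$ are both zero then $e_i = e_j$, but this
contradicts the fact that the $e_i$ form a linearly independent set.
Assume without loss of generality that $m > n$. Since $\Gamma_0$ is one-to-one we arrive at
$\Gamma_0^k e_i = e_j$ for $k = m - n$. Again, using equation (15), we arrive
at a contradiction with the fact that the $e_i$ form a linearly independent set.

From the above, $\Gamma_0 e_a \neq e_a$ for any $a \in K$, so $e_a \neq \mathbf{f}$. If
$\Gamma_0^m e_a = \mathbf{f}$ for some $m \geq 1$ then $\Gamma_0(\Gamma_0^{m-1}e_a) = \mathbf{f}$. Since $\mathbf{f}$
is a fixed point of $\Gamma_0$ and $\Gamma_0$ is one-to-one we conclude that
$\Gamma_0^{m-1}e_a = \mathbf{f}$. Repeating the argument we get $e_a = \mathbf{f}$, which
again is a contradiction. ■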

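Lemma III.3: If $\Lambda$ is countably infinite then $\phi(\mathbf{f}) = 0$.

Proof: Since $\Lambda$ is countably infinite there exists a $j \in
\{1, \dots, q-1\}$ such that $\lim_{m\to\infty}\Gamma_0^m e_j = \mathbf{f}$. Consider the set

$$\Lambda_{j,m} = \{\Gamma_0^k e_j \mid k \geq m\}$$

for $m \in \{1, 2, \dots\}$. This is a decreasing sequence of sets whose only
accumulation point is $\mathbf{f}$. Therefore $\phi(\mathbf{f}) = \lim_{m\to\infty}\phi(\Lambda_{j,m})$.
Let $f_m \in C(W)$ be defined so that

$$f_m(\nu) = \begin{cases} 1 & \nu \in \Lambda_{j,m} \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\phi(\Lambda_{j,m}) = \int_\Lambda f_m(\nu)\,d\phi.$$

By (10) and (12) we get

$$\phi(\Lambda_{j,m}) = \int_\Lambda \sum_{a\in K}\langle\nu|E_a\sigma\rangle f_m(\Gamma_a\nu)\,d\phi.$$

Therefore we get

$$\phi(\Lambda_{j,m}) = \int_{\Lambda_{j,m-1}}\langle\nu|E_0\sigma\rangle\,d\phi$$

where we used (17) and the fact that $\Gamma_0\nu \in \Lambda_{j,m}$ only if
$\nu \in \Lambda_{j,m-1}$. We have

$$\int_{\Lambda_{j,m-1}}\langle\nu|E_0\sigma\rangle\,d\phi = \int_{\Lambda_{j,m-1}}\sum_{a=0}^{q-1}\epsilon_a\nu_a\,d\phi.$$

Since $\sum_a\nu_a = 1$, $\epsilon_0 = 1$ and $\epsilon_a < 1$ for $a \neq 0$, there is a constant $r < 1$ such that

$$\sum_{a=0}^{q-1}\epsilon_a\nu_a \leq r < 1.$$

So,

$$\phi(\Lambda_{j,m}) \leq r\,\phi(\Lambda_{j,m-1}), \qquad \phi(\Lambda_{j,m}) \leq r^{m-1}\phi(\Lambda_{j,1}).$$

Therefore,

$$\lim_{m\to\infty}\phi(\Lambda_{j,m}) \leq \lim_{m\to\infty}r^{m-1}\phi(\Lambda_{j,1}) = 0. \quad ■$$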

The next lemma shows that our Assumption 1 on the Markov
transition matrix and the noise parameters is enough to satisfy
Condition 1.

Lemma III.4: If $0 < p \leq P < 1$ and $\epsilon_a > 0$ for all $a \in K$
then the matrices $(E_a)_{a\in K}$ satisfy Condition 1.

Proof: Condition 1 i. can be verified by a simple computation
by choosing $c = \epsilon p^{q-1}$ where $\epsilon = \min_j \epsilon_j$. ii. and iii.
follow from the Perron-Frobenius theorem and the fact that $E$
is a Markov transition matrix. ■

IV. Entropy rate formulas for a hidden Markov 
process 

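In this section we apply the results on manifestly positive
algebraic measures in Section II-A to the case of the hidden Markov
model described in Section III. We divide our support set $\Lambda$
into disjoint sets in the following way:

$$\begin{aligned}
\Lambda_1 &= \{\Gamma_0^m e_1 : m \in \mathbb{N}_0\} \\
\Lambda_2 &= \{\Gamma_0^m e_2 : m \in \mathbb{N}_0\} \setminus \Lambda_1 \\
\Lambda_3 &= \{\Gamma_0^m e_3 : m \in \mathbb{N}_0\} \setminus (\Lambda_1 \cup \Lambda_2) \\
&\ \ \vdots \\
\Lambda_{q-1} &= \{\Gamma_0^m e_{q-1} : m \in \mathbb{N}_0\} \setminus \bigcup_{j=1}^{q-2}\Lambda_j
\end{aligned}$$

Define

$$c_{j,m} = \prod_{l=1}^{m}\langle\Gamma_0^{m-l}e_j|E_0\sigma\rangle. \tag{22}$$

Let $A$ be the $q \times (q-1)$ matrix defined by the following entries.
If $i \neq q$ then

$$A_{ij} = \begin{cases} -\delta_{ij} + \sum_{m=0}^{|\Lambda_j|}\langle\Gamma_0^m e_j|E_i\sigma\rangle\,c_{j,m} & \text{if } q > 2 \\ 0 & \text{if } q = 2 \end{cases} \tag{23}$$

and

$$A_{qj} = \sum_{m=0}^{|\Lambda_j|}c_{j,m}. \tag{24}$$

Let

$$\Phi = [\phi(e_1), \dots, \phi(e_{q-1})]' \in \mathbb{R}^{q-1}, \qquad b = [0, 0, \dots, 1]' \in \mathbb{R}^{q}. \tag{25}$$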


Theorem IV.1: Under Assumption 1 the entropy rate of the
measure $\mu$ associated with the hidden Markov process with
the noise model described in Section III is given by

$$H(\mu) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\Phi_j \tag{26}$$

where $\Phi_j$ is the $j$-th coordinate of the solution to the matrix
equation $A\Phi = b$.
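Proof: Let

$$f_{j,m}(\nu) = \begin{cases} 1 & \text{if } \nu = \Gamma_0^m e_j \\ 0 & \text{otherwise} \end{cases}$$

for $m \in \mathbb{N}_0$ and $j = 1, \dots, q-1$. We have

$$\phi(\Gamma_0^m e_j) = \int_W f_{j,m}(\nu)\,d\phi.$$

By (10) and (12) we get

$$\phi(\Gamma_0^m e_j) = \int_W \sum_{a\in K}\langle\nu|E_a\sigma\rangle f_{j,m}(\Gamma_a\nu)\,d\phi.$$

But $\Gamma_a(\nu) = e_a$ for $a \neq 0$, and $\Gamma_0\nu = \Gamma_0^m e_j$ only if $\nu = \Gamma_0^{m-1}e_j$;
therefore

$$\phi(\Gamma_0^m e_j) = \langle\Gamma_0^{m-1}e_j|E_0\sigma\rangle\,\phi(\Gamma_0^{m-1}e_j)$$

and iterating we get

$$\phi(\Gamma_0^m e_j) = c_{j,m}\,\phi(e_j) \tag{27}$$

where $c_{j,m} = \prod_{l=1}^{m}\langle\Gamma_0^{m-l}e_j|E_0\sigma\rangle$.

Similarly we use the functions $f_{i,0}$ to solve for
$\phi(e_1), \dots, \phi(e_{q-1})$:

$$\phi(e_i) = \int_W \sum_{a\in K}\langle\nu|E_a\sigma\rangle f_{i,0}(\Gamma_a\nu)\,d\phi = \sum_{\nu\in\Lambda}\langle\nu|E_i\sigma\rangle\,\phi(\nu) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\langle\Gamma_0^m e_j|E_i\sigma\rangle\,\phi(\Gamma_0^m e_j).$$

Using (27) we get

$$\phi(e_i) = \sum_{j=1}^{q-1}\phi(e_j)\sum_{m=0}^{|\Lambda_j|}\langle\Gamma_0^m e_j|E_i\sigma\rangle\,c_{j,m}. \tag{28}$$

Now the measure of the whole set is $\phi(\Lambda) = 1$, so

$$\sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\phi(\Gamma_0^m e_j) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}c_{j,m}\,\phi(e_j) = 1. \tag{29}$$

Equations (28) and (29) form a system of $q$ linear equations
$A\Phi = b$ with $A \in M_{q,q-1}$, $\Phi \in \mathbb{R}^{q-1}$ and $b \in \mathbb{R}^{q}$ as
given by (23), (24), (25). By Lemma IV.2, $\sum_{m=0}^{\infty}c_{j,m} < \infty$,
which ensures that each $A_{ij}$ is bounded. Now from the integral
formula for the entropy rate given by (8) and the support of
the measure given by Proposition III.1 we get that

$$H(\mu) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,\phi(\Gamma_0^m e_j).$$

From (27) we get

$$H(\mu) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\phi(e_j) = \sum_{j=1}^{q-1}\sum_{m=0}^{|\Lambda_j|}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\Phi_j.$$

For the case when $E_0$ is one-to-one, from Lemma III.2 and
equation (21) it follows that

$$H(\mu) = \sum_{j=1}^{q-1}\sum_{m=0}^{\infty}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\Phi_j.$$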
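Lemma IV.2: For all $j = 1, \dots, q-1$ and $m \in \mathbb{N}_0$,
$c_{j,m} \leq \gamma^m$ where

$$\gamma = \max_j \sup_k \sum_{a=0}^{q-1}\epsilon_a[\Gamma_0^k e_j]_a < 1. \tag{30}$$

Proof:

$$c_{j,m} = \prod_{l=1}^{m}\langle\Gamma_0^{m-l}e_j|E_0\sigma\rangle = \prod_{l=1}^{m}\sum_{a=0}^{q-1}\epsilon_a[\Gamma_0^{m-l}e_j]_a. \tag{31}$$

We know by (16) that $p \leq [\Gamma_0^k e_j]_a \leq P$ for each $a \in K$. Since
$\Gamma_0^k e_j \in W$, $\sum_a[\Gamma_0^k e_j]_a = 1$. Now, by Assumption 1, $\epsilon_a < 1$
if $a \in \{1, \dots, q-1\}$ and $\epsilon_0 = 1$. So

$$\sum_{a=0}^{q-1}\epsilon_a[\Gamma_0^k e_j]_a < 1 \quad \forall k \in \mathbb{N}_0.$$

Moreover, by (20),

$$\lim_{k\to\infty}\Gamma_0^k e_j = \mathbf{f} \in W.$$

Therefore,

$$\sup_k\sum_{a=0}^{q-1}\epsilon_a[\Gamma_0^k e_j]_a < 1$$

and

$$\gamma = \max_j\sup_k\sum_{a=0}^{q-1}\epsilon_a[\Gamma_0^k e_j]_a < 1,$$

so that $c_{j,m} \leq \gamma^m$ by (31). ■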



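We also prove the following bound for $c_{j,m}$ which will be
useful in the numerical estimates in Section V.

Lemma IV.3: Let $\epsilon = \max_{a\in\{1,\dots,q-1\}}\epsilon_a$. If $\epsilon < p$ then
$c_{j,m} \leq r^m$ where

$$r = 1 - (q-1)(p - \epsilon P) < 1. \tag{32}$$

Proof: $\epsilon < p \Rightarrow r = (1 - (q-1)p + (q-1)\epsilon P) < 1$. Since every
component of $\Gamma_0^k e_j$ lies in $[p, P]$,

$$c_{j,m} = \prod_{l=1}^{m}\Big([\Gamma_0^{m-l}e_j]_0 + \sum_{a=1}^{q-1}\epsilon_a[\Gamma_0^{m-l}e_j]_a\Big) \leq \prod_{l=1}^{m}\big(1 - (q-1)p + (q-1)\epsilon P\big) = r^m. \quad ■$$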
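
The bound of Lemma IV.3 is easy to check numerically. The sketch below computes the coefficients $c_{j,m}$ of (22) for the parameters of Example 1 and compares them with $r^m$; the function name and the cut-off of 50 terms are illustrative assumptions.

```python
import numpy as np

def c_coefficients(E, eps, j, M):
    """c_{j,0..M} with c_{j,m} = prod_{l=1}^m <Gamma_0^{m-l} e_j | E_0 sigma>."""
    nu = E[j].copy()
    c = [1.0]                            # empty product for m = 0
    for _ in range(M):
        s = float(eps @ nu)              # <nu | E_0 sigma> = sum_a eps_a nu_a
        c.append(c[-1] * s)
        nu = E.T @ (eps * nu) / s        # Gamma_0 nu
    return np.array(c)

E = np.array([[0.85, 0.15], [0.28, 0.72]])
eps = np.array([1.0, 0.01])
q = E.shape[0]
p, P = E.min(), E.max()
r = 1.0 - (q - 1) * (p - eps[1:].max() * P)      # equation (32)
c = c_coefficients(E, eps, j=1, M=50)
print(r)
print(bool(np.all(c <= r ** np.arange(51) + 1e-15)))
```

Because $r$ depends only on $p$, $P$, $q$ and the noise parameters, it is much cheaper to obtain than the constant $\gamma$ of Lemma IV.2, which requires a supremum over the whole orbit.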



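V. Approximating the entropy rate

In this section we present some numerical approximations to
the entropy rate formulas that were derived in Section IV. We
show that if we take only the first $N$ terms of the entries of the matrix $A$
given by (23) and (24) for the entropy rate calculations then
this gives an approximation of order $O(\gamma^{N+1})$ where $\gamma$ is given
by (30).

Let

$$A = \hat{A} + R \tag{33}$$

where the entries of $R$ are the $N$-th tails of the entries of $A$.
Let $\hat{\Phi}$ be the least squares solution to

$$\hat{A}\hat{\Phi} = b. \tag{34}$$

Therefore

$$\hat{\Phi} = \hat{A}^{+}b \tag{35}$$

where $\hat{A}^{+}$ is the pseudo-inverse of $\hat{A}$. We first prove the
following lemma.

Lemma V.1:

$$\|\Phi - \hat{\Phi}\|_1 \leq \frac{q\,\|\hat{A}^{+}\|_1\,\gamma^{N+1}}{1-\gamma}.$$

Proof: From (33) and $A\Phi = b$,

$$A\Phi = \hat{A}\Phi + R\Phi = b.$$

Substituting (35),

$$\hat{A}\Phi - \hat{A}\hat{\Phi} = b - R\Phi - \hat{A}\hat{\Phi} = b - R\Phi - \hat{A}\hat{A}^{+}b,$$

$$\|\Phi - \hat{\Phi}\|_1 = \|\hat{A}^{+}(b - R\Phi) - \hat{A}^{+}b\|_1,$$

$$\|\Phi - \hat{\Phi}\|_1 \leq \|\hat{A}^{+}\|_1\,\|R\Phi\|_1.$$

But since $R$ consists of the tails of the entries of $A$, each entry in $R$
is bounded by $\sum_{m=N+1}^{\infty}\gamma^m = \frac{\gamma^{N+1}}{1-\gamma}$. Hence

$$\|R\Phi\|_1 \leq \frac{\gamma^{N+1}}{1-\gamma}\left\|\begin{pmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{pmatrix}\Phi\right\|_1 \leq \frac{q\,\gamma^{N+1}}{1-\gamma}$$

and

$$\|\Phi - \hat{\Phi}\|_1 \leq \|\hat{A}^{+}\|_1\,\|R\Phi\|_1 \leq \frac{q\,\|\hat{A}^{+}\|_1\,\gamma^{N+1}}{1-\gamma}. \quad ■$$

Theorem V.2: Under Assumption 1 the entropy rate of the
measure $\mu$ associated with the hidden Markov process described
in Section III can be approximated to $O(\gamma^{N+1})$ by

$$H_N(\mu) = \sum_{j=1}^{q-1}\sum_{m=0}^{N}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\hat{\Phi}_j.$$

Proof:

$$\mathrm{err}(N) = |H(\mu) - H_N(\mu)|$$

$$\mathrm{err}(N) = \Big|\sum_{j=1}^{q-1}\sum_{m=N+1}^{\infty}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,\Phi_j + \sum_{j=1}^{q-1}\sum_{m=0}^{N}\sum_{a=0}^{q-1}h_a(\Gamma_0^m e_j)\,c_{j,m}\,(\Phi_j - \hat{\Phi}_j)\Big|$$

$$\leq q\sum_{j=1}^{q-1}\sum_{m=N+1}^{\infty}\gamma^m\,\Phi_j + q\sum_{j=1}^{q-1}\sum_{m=0}^{\infty}\gamma^m\,|\Phi_j - \hat{\Phi}_j|$$

$$= \frac{q\,\gamma^{N+1}}{1-\gamma}\|\Phi\|_1 + \frac{q}{1-\gamma}\|\Phi - \hat{\Phi}\|_1.$$

By Lemma V.1,

$$\|\Phi - \hat{\Phi}\|_1 \leq \frac{q\,\|\hat{A}^{+}\|_1\,\gamma^{N+1}}{1-\gamma}.$$

Therefore

$$\mathrm{err}(N) \leq \frac{q\,\gamma^{N+1}}{1-\gamma}\left(\|\Phi\|_1 + \frac{q\,\|\hat{A}^{+}\|_1}{1-\gamma}\right) = O(\gamma^{N+1}). \tag{36}$$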



Next, we present a numerical example for approximating
the entropy rate formulas given by Theorem V.2. We can get
estimates on err(N) using the bound (36), but $\gamma$ is difficult
to compute. However, note that Lemma IV.3 implies that
whenever $\epsilon < p$ the bound

$$\mathrm{err}(N) \leq \frac{q\,r^{N+1}}{1-r}\left(\|\Phi\|_1 + \frac{q\,\|\hat{A}^{+}\|_1}{1-r}\right) \tag{37}$$

is also obtained, where $r$ is given by (32). In the examples that
follow, in order to estimate err(N) we work with the additional
assumption that $\epsilon < p$.

1) Example 1: In the example we let $q = 2$, $\epsilon = 0.01$. The
transition matrix we use is

$$E = \begin{pmatrix} 0.85 & 0.15 \\ 0.28 & 0.72 \end{pmatrix}.$$

Table I shows the estimated entropy rate $H_N(\mu)$ using the
formula given by Theorem V.2. Figure 2 shows the support
of the measure $\phi$. For the figure the value of $\epsilon$ is chosen to be 0.2 so as to
make the plot clearly visible.
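
The computation behind this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name is hypothetical, logarithms are taken to base 2, the orbit points $\Gamma_0^m e_j$ are assumed to be distinct, and the truncated system is solved with NumPy's least squares routine.

```python
import numpy as np

def entropy_rate_estimate(E, eps, N):
    """Truncated entropy rate in bits per symbol, summing m = 0..N for each e_j."""
    E = np.asarray(E, dtype=float)
    eps = np.asarray(eps, dtype=float)          # eps[0] must equal 1
    q = E.shape[0]
    # V[b, a] = (E_a sigma)_b, so nu @ V lists <nu | E_a sigma> for a = 0..q-1.
    V = np.diag(1.0 - eps)
    V[:, 0] = eps
    A = np.zeros((q, q - 1))
    weights = np.zeros(q - 1)                   # sum_m c_{j,m} sum_a h_a(Gamma_0^m e_j)
    for j in range(1, q):
        nu, c = E[j].copy(), 1.0
        for _ in range(N + 1):
            pr = nu @ V                         # output-symbol probabilities at nu
            nz = pr[pr > 0]
            weights[j - 1] += c * float(-(nz * np.log2(nz)).sum())
            A[: q - 1, j - 1] += c * pr[1:]     # rows i = 1..q-1 of the matrix A
            A[q - 1, j - 1] += c                # normalisation row
            c *= pr[0]                          # next coefficient c_{j,m+1}
            nu = E.T @ (eps * nu) / pr[0]       # Gamma_0 nu
    A[: q - 1, :] -= np.eye(q - 1)
    if q == 2:
        A[0, 0] = 0.0                           # convention of (23) for q = 2
    b = np.zeros(q)
    b[-1] = 1.0
    Phi_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(weights @ Phi_hat)

E = [[0.85, 0.15], [0.28, 0.72]]
eps = [1.0, 0.01]
for N in (10, 50, 100):
    print(N, entropy_rate_estimate(E, eps, N))
```

For these parameters the printed values should approach the entries of Table I as $N$ grows. Solving the rectangular system by least squares mirrors the pseudo-inverse used in (35); a different solver would be an equally reasonable choice.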



N      $H_N(\mu)$          err(N) bound
10     0.71399868740464    15.6656
20     0.70277846315804    3.4068
30     0.70083402087899    0.7408
40     0.70045844593354    0.1611
50     0.70038443295765    0.0350
60     0.70036979023825    0.0076
70     0.70036689107994    0.0016
80     0.70036631697843    3.6038 x 10^-4
90     0.70036620328938    7.837 x 10^-5
100    0.70036618077546    1.7044 x 10^-5

TABLE I

List of values of entropy rate for q = 2



N      $H_N(\mu)$          err(N) bound
10     0.95961052113515    0.3561
20     0.95961126155225    0.0030
30     0.95961126164043    2.6758 x 10^-5
40     0.95961126164044    2.3193 x 10^-7
50     0.95961126164044    2.0103 x 10^-9

TABLE II

List of values of entropy rate for q = 3



Fig. 3. Plot of the entropy rate $H_N(\mu)$ computed using the new formulas versus
$N$. The estimate converges very quickly with $N$.

Fig. 2. The support of the measure $\phi$. It is seen that the support converges
to the accumulation point $\mathbf{f}$.



2) Example 2: In the example we let $q = 3$, $\epsilon_1 = 0.01$ and
$\epsilon_2 = 0.02$. The transition matrix we use is

$$E = \begin{pmatrix} 0.4 & 0.25 & 0.35 \\ 0.25 & 0.45 & 0.3 \\ 0.2 & 0.55 & 0.25 \end{pmatrix}.$$

Table II shows the estimated entropy rate $H_N(\mu)$ using the
formula given by Theorem V.2. Figure 3 shows a plot of the
entropy rate $H_N(\mu)$ versus $N$. It is seen that the estimates
$H_N(\mu)$ converge very quickly with $N$. Figure 4 shows the plot
of the bound on err(N) given by equation (37) versus $N$. It
can be seen that we get a very good bound on err(N) within
a few terms.



Fig. 4. Plot of the error err(N) versus $N$. A reasonable bound on the error
is obtained with fewer than 50 terms of the sum.